Comparison and Combination of Multilayer Perceptrons and Deep Belief Networks in Hybrid Automatic Speech Recognition Systems

Authors

  • Van Hai Do
  • Xiong Xiao
  • Eng Siong Chng

Abstract

To improve speech recognition performance, many ways of augmenting or combining HMMs (Hidden Markov Models) with other models to build hybrid architectures have been proposed. The hybrid HMM/ANN (Hidden Markov Model / Artificial Neural Network) architecture is one of the most successful approaches. In this hybrid model, ANNs (often multilayer perceptrons, MLPs) are used as HMM-state posterior estimators. Recently, Deep Belief Networks (DBNs) were introduced as a powerful new machine learning technique. Generally, DBNs are MLPs with many hidden layers. However, while the weights of MLPs are often initialized randomly, DBNs use a greedy layer-by-layer pretraining algorithm to initialize the network weights. This pretraining step has led to successful applications of DBNs in tasks such as handwriting recognition, 3-D object recognition, dimensionality reduction and automatic speech recognition (ASR). To evaluate the effectiveness of the pretraining step that distinguishes DBNs from MLPs for ASR tasks, we conduct a comparative evaluation of the two systems on phone recognition for the TIMIT database. The effectiveness, advantages and computational cost of each method are investigated and analyzed. We also show that the information generated by DBNs and MLPs is complementary: a consistent improvement is observed when the two systems are combined. In addition, we investigate the ability of the hybrid HMM/DBN system when only a limited amount of labeled training data is available.
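
As an illustration of the two ideas in the abstract (a network used as an HMM-state posterior estimator, and the combination of two such estimators), the following is a minimal sketch rather than the authors' method. It assumes the standard hybrid practice of dividing state posteriors by state priors to obtain scaled likelihoods, and it assumes a log-linear (weighted geometric) combination rule, which the abstract does not specify. The function names, the `weight` parameter and all numbers are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert state posteriors p(s|x) into scaled log-likelihoods.

    In hybrid HMM/ANN decoding, log p(x|s) is replaced (up to a constant
    that does not depend on s) by log p(s|x) - log p(s).
    """
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]


def combine_posteriors(post_a, post_b, weight=0.5, floor=1e-10):
    """Log-linear combination of two posterior vectors (an assumed rule).

    Computes p_a^weight * p_b^(1 - weight) per state and renormalizes
    so that the result sums to 1.
    """
    logs = [
        weight * math.log(max(a, floor)) + (1.0 - weight) * math.log(max(b, floor))
        for a, b in zip(post_a, post_b)
    ]
    peak = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - peak) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical per-frame posteriors over three HMM states.
mlp_post = [0.6, 0.3, 0.1]
dbn_post = [0.5, 0.1, 0.4]
priors = [0.5, 0.3, 0.2]  # hypothetical state priors from training alignments

combined = combine_posteriors(mlp_post, dbn_post)
scores = scaled_log_likelihoods(combined, priors)
```

In a real system the `scores` for each frame would be passed to an HMM decoder in place of Gaussian mixture log-likelihoods; the weight between the two systems would typically be tuned on held-out data.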

Similar resources

Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR

In this paper we apply the Full Combination (FC) multi-band approach, which was originally introduced in the framework of posterior-based HMM/ANN (Hidden Markov Model/Artificial Neural Network) hybrid systems, to systems in which the ANN (or Multilayer Perceptron (MLP)) is itself replaced by a Multi Gaussian HMM (MGM). Both systems represent the most widely used statistical models for robu...

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Learning Temporal Dependencies in Connectionist Speech Recognition

Hybrid connectionist/HMM systems model time both using a Markov chain and through properties of a connectionist network. In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into an MLP to produce an enhanced input representation. ...

F0 modeling in HMM-based speech synthesis system using Deep Belief Network

In recent years, multilayer perceptrons (MLPs) with many hidden layers, known as deep neural networks (DNNs), have performed surprisingly well in many speech tasks, e.g., speech recognition, speaker verification and speech synthesis. However, in the context of F0 modeling these techniques have not been exploited properly. In this paper, a Deep Belief Network (DBN), a member of the DNN family, has been employed and app...

Exploiting deep neural networks for detection-based speech recognition

In recent years, deep neural networks (DNNs) – multilayer perceptrons (MLPs) with many hidden layers – have been successfully applied to several speech tasks, e.g., phoneme recognition, out-of-vocabulary word detection and confidence measures. In this paper, we show that DNNs can be used to boost the classification accuracy of basic speech units, such as phonetic attributes (phonological featur...

Publication date: 2011